Enterprise Database Systems
Introduction to Statistical Concepts
Core Statistical Concepts: An Overview of Statistics & Sampling
Core Statistical Concepts: Statistics & Sampling with Python
Final Exam: Statistics and Probability

Core Statistical Concepts: An Overview of Statistics & Sampling

Course Number:
it_daitscdj_01_enus
Lesson Objectives

Core Statistical Concepts: An Overview of Statistics & Sampling

  • discover the key concepts covered in this course
  • describe what statistics, populations, and samples are
  • recognize how metrics such as mean, median, and mode describe data
  • recall what information is conveyed by measures such as standard deviation and variance
  • summarize the workings of a number of probability sampling techniques
  • outline how to create balanced samples from an imbalanced dataset
  • summarize the key concepts covered in this course

Overview/Description
With data now being one of the most valuable assets to tap into, the demand for data science skills increases by the day. Statistics and sampling are at the core of data science. Use this course as a theoretical introduction to using samples to reveal various statistics. Examine what exactly is meant by statistics and samples. Explore descriptive statistics, namely measures of central tendency and of dispersion. Study probability sampling techniques, including simple random sampling and cluster sampling. Investigate how undersampling and oversampling are used to generate more balanced datasets. Upon completion, you'll know the best way to use statistics and samples for your specific goals and needs.
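As a rough illustration of the ideas named above, the following minimal sketch uses only the Python standard library. The `values` and `labels` lists are made-up data, and the balancing steps show one simple way to undersample and oversample at random, not a prescribed method from the course.

```python
import random
import statistics

# Hypothetical observations, chosen only for illustration
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Measures of dispersion (population variance and standard deviation)
variance = statistics.pvariance(values)
std_dev = statistics.pstdev(values)

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Simple random sampling: each member has the same chance of being picked
simple_sample = rng.sample(values, k=4)

# Hypothetical imbalanced class labels: 90 majority (0), 10 minority (1)
labels = [0] * 90 + [1] * 10
majority = [x for x in labels if x == 0]
minority = [x for x in labels if x == 1]

# Random undersampling: shrink the majority class to the minority size
undersampled = rng.sample(majority, k=len(minority)) + minority

# Random oversampling: draw minority members with replacement
# until the minority class matches the majority size
oversampled = majority + rng.choices(minority, k=len(majority))
```

Undersampling discards majority-class data, while oversampling repeats minority-class data; which trade-off is acceptable depends on the dataset.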

Target

Prerequisites: none

Core Statistical Concepts: Statistics & Sampling with Python

Course Number:
it_daitscdj_02_enus
Lesson Objectives

Core Statistical Concepts: Statistics & Sampling with Python

  • discover the key concepts covered in this course
  • install the latest versions of pandas and visualization modules used to analyze data
  • load data from a CSV file into a pandas DataFrame and perform some initial analysis
  • calculate the mean and median of a distribution using your own function and compare it with the built-in pandas function
  • use Seaborn and Matplotlib to visualize a distribution and where the mean, median, and mode fit in
  • compute and visualize the standard deviation and variance of a distribution
  • implement simple random and stratified sampling on a data frame
  • use pandas to generate a sample using cluster and systematic sampling
  • create a balanced sample using random undersampling and oversampling
  • generate synthetic data in order to create a balanced sample using the Synthetic Minority Over-sampling Technique (SMOTE)
  • summarize the key concepts covered in this course

Overview/Description
Data is one of the most valuable assets a business has, but it's only as valuable as the methods used to interpret it. Data science, which at its core includes statistics and sampling, is the key to data interpretation. In this course, practice using the pandas library in Python to work with statistics and sampling. Practice loading data from a CSV file into a pandas DataFrame. Compute a variety of statistics on data. While doing so, see how to visualize the relationship between data and computed statistics. Moving along, implement several sampling techniques, such as stratified sampling and cluster sampling. Then, explore how a balanced sample can be created from an imbalanced dataset using the imblearn module in Python. Upon completion, you'll be able to generate samples and compute statistics using various tools and methods.
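The sampling techniques mentioned above might look roughly like this in pandas. This is a sketch under assumptions: the DataFrame is built in memory from invented data rather than loaded from a CSV file, the column names `value` and `label` are hypothetical, and the balancing step uses plain pandas rather than the imblearn module the course covers.

```python
import pandas as pd

# Hypothetical imbalanced dataset: 80 majority rows, 20 minority rows
df = pd.DataFrame({
    "value": range(100),
    "label": ["majority"] * 80 + ["minority"] * 20,
})

# Basic statistics on a numeric column
mean_value = df["value"].mean()
median_value = df["value"].median()

# Simple random sampling: 10 rows drawn without replacement
simple = df.sample(n=10, random_state=0)

# Stratified sampling: 10% of the rows from each label group
stratified = df.groupby("label").sample(frac=0.1, random_state=0)

# Systematic sampling: every 10th row, starting from the first
systematic = df.iloc[::10]

# Random undersampling: the same number of rows from every label,
# equal to the size of the smallest group
n_min = df["label"].value_counts().min()
balanced = df.groupby("label").sample(n=n_min, random_state=0)
```

Stratified sampling preserves the 80/20 label ratio of the original data, whereas the undersampled frame has equal counts per label.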

Target

Prerequisites: none

Final Exam: Statistics and Probability

Course Number:
it_feemds_02_enus
Lesson Objectives

Final Exam: Statistics and Probability

  • analyze and visualize data using box plots
  • analyze a uniform distribution by using cumulative distribution and probability density functions
  • apply Poisson distributions to make estimates in real-life situations
  • calculate and visualize confidence intervals using Python
  • calculate joint probabilities associated with the rolling of a die
  • calculate the joint probability of dependent variables
  • calculate the mean and median of a distribution using your own function and compare it with the built-in pandas function
  • compare and contrast type I and type II errors in hypothesis testing
  • compute conditional probabilities
  • create a balanced sample using random undersampling and oversampling
  • create a function to manually perform a T-test
  • create naive Bayes models in Python
  • define a Bayesian model in Python
  • define and understand Bayes' theorem
  • define descriptive and inferential statistics
  • define joint, marginal, and conditional probability
  • define terms such as event, outcome, and experiment
  • define the formula of the expected value of a random variable
  • describe binomial distributions and generate one using SciPy
  • describe different types of probability distributions and where they occur
  • describe normal distributions and their characteristics
  • describe the fundamentals of hypothesis testing
  • describe type I and type II errors
  • describe what statistics, populations, and samples are
  • estimate a population's mean with confidence intervals
  • explain the law of large numbers programmatically
  • explore one-sided and two-sided T-tests
  • explore probabilities associated with a Bayesian model
  • explore the probability tables of nodes in a Bayesian network
  • import Python libraries needed to work with probabilities
  • interpret p-values using alpha levels
  • load data from a CSV file into a pandas DataFrame and perform some initial analysis
  • outline the use of one-way ANOVA
  • outline the use of two-way ANOVA
  • perform the paired T-test on paired samples
  • perform the Wilcoxon signed-rank test to compare medians
  • perform T-tests on real-world data
  • predict values with Bayesian models
  • recall the assumptions of the two-sample T-test
  • recall the symmetrical features of normal distributions
  • recognize how data is distributed using histograms and violin plots
  • recognize how metrics such as mean, median, and mode describe data
  • recognize the use of the Mann-Whitney U-test
  • recognize when Welch’s T-test should be used
  • recount binomial distributions and generate one using SciPy
  • set up null and alternative hypotheses for statistical tests
  • simulate the flipping of a coin in Python
  • simulate the rolling of two dice to test joint probability
  • summarize the workings of a number of probability sampling techniques
  • test medians using the Wilcoxon signed-rank test
  • use Levene’s test to check for equal variances
  • use Poisson distributions to make estimates in real-life situations
  • use Seaborn and Matplotlib to visualize a distribution and where the mean, median, and mode fit in
  • use the Mann-Whitney U-test
  • use the non-parametric Kruskal-Wallis test
  • use the two-sample T-test to compare means
  • use Welch’s T-test to compare means
  • use Tukey’s HSD to know which categories differ significantly
  • use two-way ANOVA with interaction between the independent variables

Overview/Description

Final Exam: Statistics and Probability will test your knowledge and application of the topics presented throughout the Statistics and Probability track of the Skillsoft Aspire Essential Math for Data Science Journey.



Target

Prerequisites: none
